Version 2

  1. Split the data into K stratified folds
  2. Define the sets of model parameter values to evaluate
  3. For each k-fold resampling iteration DO
    1. For each parameter set in grid search DO
      1. Hold out 1/k of the samples (one fold)
      2. Pre-process the data (fit each transformation on the training folds only, then apply the same fitted transformation to the hold-out fold)
        1. Impute data (median)
        2. Scale features: (x_i - mean)/std
        3. Perform any univariate feature selection (remove very low-variance features)
        4. Modeling feature selection (ExtraTreesClassifier)
      3. Fit the model on the remaining (k-1)/k of the samples (the training folds)
      4. Predict the hold-out fold
    2. END
    3. Calculate the performance of each parameter set on the hold-out predictions
  4. END
  5. Determine the optimal parameter set from the average performance across all K folds
  6. Fit the final model to all training data using the optimal parameter set
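
The outline above can be sketched with scikit-learn, where a `Pipeline` keeps every pre-processing step fitted on the training folds only and `GridSearchCV` runs the loop over parameter sets and folds. This is a minimal sketch, not a definitive implementation: the synthetic data, the choice of `LogisticRegression` as the final model, the grid of `C` values, and k = 5 are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel, VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); about 5% of values are set to NaN
# so that the imputation step has something to do.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Step 3.1.2: every transformer is fitted on the training folds only and
# then applied unchanged to the hold-out fold.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # 1. median imputation
    ("scale", StandardScaler()),                    # 2. (x_i - mean)/std
    # 3. After scaling, the default threshold of 0 only drops constant features.
    ("variance", VarianceThreshold()),
    # 4. Model-based selection; keeps features with above-mean importance.
    ("select", SelectFromModel(
        ExtraTreesClassifier(n_estimators=50, random_state=0))),
    ("model", LogisticRegression(max_iter=1000)),   # final model (assumption)
])

# Step 2: parameter sets to evaluate (illustrative values).
param_grid = {"model__C": [0.01, 0.1, 1.0, 10.0]}

# Step 1: stratified K-fold split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Steps 3 to 6: fit and score every parameter set on every fold, average the
# hold-out scores per parameter set, then refit the best set on all the data.
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy", refit=True)
search.fit(X, y)

best_params = search.best_params_                    # step 5
mean_scores = search.cv_results_["mean_test_score"]  # per-set fold average
final_model = search.best_estimator_                 # step 6
```

Wrapping the pre-processing in the pipeline, rather than transforming `X` once up front, is what prevents information from the hold-out fold leaking into imputation, scaling, and feature selection.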
